Shahrukh Mallick
CS 6670: Computer Vision
Project 1: Feature Detection and Matching
My Own Feature Descriptor (pseudo-SIFT implementation)
Description
For my own feature, I decided to implement my own crude version of SIFT. Here's how it works. I used the points generated by the Harris feature detector as the points of interest. You also need the gradient magnitude of every pixel; this involves much the same procedure as computing the Harris values, plus an additional step to take the magnitude of the gradient at each pixel.
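For concreteness, here is a minimal sketch of that gradient step in Python/NumPy. The choice of derivative filter (np.gradient's central differences) is an assumption; the actual implementation may reuse the same derivative filters as the Harris computation, or a Sobel filter.

```python
import numpy as np

def gradient_magnitude_orientation(img):
    """Per-pixel gradient magnitude and orientation (radians) for a
    grayscale image given as a 2-D float array. Central differences
    stand in for whatever derivative filter was actually used."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy), np.arctan2(gy, gx)  # orientation in [-pi, pi]
```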
Next, you iterate over all points of interest, take a 16x16 window around each one, and break that window into sixteen 4x4 cells. For each cell, you build an 8-bin orientation histogram from the gradient directions, ignoring pixels whose gradient magnitude falls below some threshold (tuning this is difficult). Concatenating the histograms of the 16 cells yields a 128-element descriptor for the feature, which you then normalize. For more exact details, there's plenty of documentation on SIFT online and in our class notes.
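Below is a rough sketch of the descriptor assembly in the same Python style. Two details are assumptions not spelled out above: bins accumulate gradient magnitude rather than raw pixel counts (as full SIFT does), and points closer than 8 pixels to the image border are assumed to have been filtered out beforehand.

```python
import numpy as np

def sift_like_descriptor(magnitude, orientation, y, x, mag_threshold):
    """128-element descriptor for the interest point at (y, x): a 16x16
    window split into sixteen 4x4 cells, each summarized by an 8-bin
    orientation histogram. Assumes (y, x) is at least 8 pixels from the
    border. Magnitude-weighted bins are an assumption."""
    desc = []
    for cy in range(4):                       # 4x4 grid of cells
        for cx in range(4):
            top, left = y - 8 + 4 * cy, x - 8 + 4 * cx
            hist = np.zeros(8)
            for i in range(top, top + 4):
                for j in range(left, left + 4):
                    m = magnitude[i, j]
                    if m < mag_threshold:     # ignore weak gradients
                        continue
                    # map orientation in [-pi, pi] onto 8 bins
                    b = int((orientation[i, j] + np.pi) / (2 * np.pi) * 8) % 8
                    hist[b] += m
            desc.extend(hist)
    desc = np.asarray(desc)                   # length 16 * 8 = 128
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # normalize the descriptor
```

In the full SIFT pipeline the window would also be rotated to a dominant orientation and the histogram contributions Gaussian-weighted; this sketch omits both, matching the simplified description above.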
Reason for major design choices
There weren't too many design choices of my own, since I was following the slides on the SIFT implementation. I experimentally determined a threshold value for the gradient magnitude by testing several values below the mean magnitude of the image. (In retrospect, starting from the median magnitude probably would have been wiser, since a few strong edges can skew the mean.)
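A hypothetical version of that tuning loop might look like the following; pick_threshold, score_fn, and the fraction grid are all illustrative stand-ins, with score_fn representing a rerun of the matching benchmark at each candidate threshold.

```python
import numpy as np

def pick_threshold(magnitude, score_fn, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Try gradient-magnitude thresholds at several fractions of the
    mean magnitude and keep the best-scoring candidate. score_fn is a
    caller-supplied callback (e.g. AUC on a validation image pair).
    Swapping np.mean for np.median gives the median-anchored variant
    suggested above."""
    candidates = [f * float(np.mean(magnitude)) for f in fractions]
    return max(candidates, key=score_fn)
```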
Performance
Provided below are several charts and tables on the performance of the three different descriptor types.
Figure 1 below shows the ROC curves of the three descriptors on the Yosemite pictures, with SIFT included as a reference. All three descriptors perform well (AUC above 0.87 in every case), with my own descriptor scoring highest (~0.95) among the three. MOPS was second best, and the simple window descriptor came in last. In all cases, the ratio test improved the AUC.
Figure 1: Yosemite ROC plot
AUC values for Yosemite:
Simple Window + SSD: 0.889366
Simple Window + Ratio: 0.929238
MOPS + SSD: 0.878177
MOPS + Ratio: 0.949780
My Own + SSD: 0.928682
My Own + Ratio: 0.951060
Figure 2 shows the ROC curves on the graf pictures. As expected, all the descriptors performed worse on these images than on Yosemite. Here, MOPS outperformed my own descriptor by a fairly large margin (about 10 points of AUC with SSD, 14 with the ratio test). The simple window descriptor did not perform well either, but that's to be expected: graf changes the viewing angle, and the simple window descriptor only handles translations.
Figure 2: Graf ROC plot
AUC values for Graf:
Simple Window + SSD: 0.571100
Simple Window + Ratio: 0.711916
MOPS + SSD: 0.765770
MOPS + Ratio: 0.860932
My Own + SSD: 0.662242
My Own + Ratio: 0.716466
Figure 3 shows an example plot of the match threshold's effect for the MOPS descriptor on the two image sets. Both plots indicate there is a good threshold value to use for matching. Similar plots can be made to determine optimal thresholds for the other two descriptors (the sweep itself is sketched below).
Figure 3: Threshold plot for Yosemite (left) and graf (right)
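A minimal sketch of how such a threshold sweep can be computed, assuming the benchmark exposes per-match distances and ground-truth correctness flags (an assumption about the harness, not its actual interface):

```python
import numpy as np

def threshold_sweep(distances, is_correct):
    """Sweep the match-acceptance threshold over all observed match
    distances and return (threshold, true-positive rate, false-positive
    rate) triples -- the data behind a plot like Figure 3."""
    distances = np.asarray(distances, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)
    pos = max(int(is_correct.sum()), 1)
    neg = max(int((~is_correct).sum()), 1)
    points = []
    for t in np.sort(distances):          # accept matches with distance <= t
        accepted = distances <= t
        tpr = (accepted & is_correct).sum() / pos
        fpr = (accepted & ~is_correct).sum() / neg
        points.append((t, tpr, fpr))
    return points
```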
Harris Operator Results (as requested)
Harris Operator Results on Yosemite1.jpg
Harris Operator Results on graf image (img1.ppm)
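For reference, a minimal sketch of the Harris response computation behind these images, using the det(H) - k·trace(H)² corner score; the smoothing sigma and k below are common textbook defaults, not necessarily the values used here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.04):
    """Harris corner response c = det(H) - k * trace(H)^2, where H is
    the Gaussian-smoothed structure tensor of image gradients."""
    gy, gx = np.gradient(img.astype(np.float64))
    sxx = gaussian_filter(gx * gx, sigma)   # structure tensor entries,
    syy = gaussian_filter(gy * gy, sigma)   # smoothed over a local window
    sxy = gaussian_filter(gx * gy, sigma)
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
```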
Benchmark Results
SW = Simple Window descriptor
MY = My own descriptor
MOPS = MOPS descriptor
| Descriptor | Bike: Avg Error / Avg AUC | Graf: Avg Error / Avg AUC | Leuven: Avg Error / Avg AUC | Wall: Avg Error / Avg AUC |
|---|---|---|---|---|
| SW + SSD | 525 / 61.25% | 270 / 50.43% | 401 / 30.29% | 336 / 47.76% |
| SW + ratio | 525 / 62.15% | 270 / 50.19% | 401 / 48.35% | 336 / 55.84% |
| MOPS + SSD | 529 / 52.70% | 294 / 51.77% | 310 / 68.04% | 361 / 63.19% |
| MOPS + ratio | 529 / 57.30% | 294 / 59.15% | 310 / 68.09% | 361 / 62.73% |
| MY + SSD | 491 / 49.31% | 264 / 53.01% | 302 / 55.59% | 303 / 54.52% |
| MY + ratio | 491 / 55.04% | 264 / 53.31% | 302 / 60.60% | 303 / 58.63% |
Table 1: Error and AUC values averaged over the images in each directory. Error is in pixels; AUC is the area under the ROC curve.
Overall, the descriptors performed decently, but there is still plenty of room for improvement. In almost all cases the ratio test improved results, confirming it is the better metric to use.
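For concreteness, here is a minimal sketch of the two scoring schemes compared throughout (plain SSD versus the ratio of best to second-best SSD). The brute-force loop is an illustration, not the benchmark's actual matching code.

```python
import numpy as np

def match_features(desc1, desc2, use_ratio=True):
    """Match each row of desc1 (N1 x 128) against desc2 (N2 x 128, with
    N2 >= 2) by SSD. The score is either the raw best SSD or the ratio
    of best to second-best SSD; lower is better either way."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)   # SSD to every candidate
        order = np.argsort(ssd)
        best, second = order[0], order[1]
        score = ssd[best] / max(ssd[second], 1e-12) if use_ratio else ssd[best]
        matches.append((i, int(best), float(score)))
    return matches
```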
The simple window descriptor's strengths and weaknesses are the most obvious. Since it only handles translational changes, it struggles on images where the viewing angle changes. Hence it performed best on bike, where the images don't shift but only blur, and struggled on wall and graf because of the change in camera angle. A surprising result is how poorly it performed on Leuven. This can be partially explained by the threshold used during Harris feature selection: a fixed threshold selects very different features across pictures of varying brightness, and the effect is very apparent in the results. A possible way to improve this would be to tune the threshold and see which value works best for the simple window descriptor.
MOPS performed the best of the three descriptors, and did particularly well on the Leuven image set. This is likely because brightness changes don't affect MOPS as much, since its patch orientation comes from the gradient, which doesn't change across the images, and normalizing the patch intensities in the MOPS descriptor deals directly with what the Leuven image set tests. MOPS also did well on the wall image set, likely because its measured orientation angle helped in describing the features, since viewing angle was the main thing being tested there. Compared to those two sets, MOPS did not do as well on bike or graf. Graf is a very difficult image set, as the image warps a lot, leaving plenty of room for mistakes. The poor performance on bike is likely because blurring causes some loss of detail, making it harder for the MOPS descriptor to get strong gradients.
Lastly, my own descriptor did not do as well as I had hoped. Since it was modeled on SIFT, I expected it to be robust, but it had similar trouble on the bike and graf image sets as the MOPS descriptor did. However, it performed well on the Leuven and wall image sets, for similar reasons as MOPS.
Extra Credit
I was hoping my attempt at implementing SIFT would warrant some extra credit, but the results don’t seem to indicate the implementation was that successful. Points for effort?